Infrastructure for Performance Tuning MPI Applications

Authors

  • Kathryn Marie Mohror
  • Karen L. Karavanic
  • David McClure

Abstract

An abstract of the thesis of Kathryn Marie Mohror for the Master of Science in Computer Science, presented November 13, 2003. Title: Infrastructure for Performance Tuning MPI Applications.

Clusters of workstations are becoming increasingly popular as a low-budget alternative for supercomputing power. In these systems, message passing is often used to allow the separate nodes to act as a single computing machine. Programmers of such systems face a daunting challenge in understanding the performance bottlenecks of their applications. This is largely due to the vast amount of performance data that is collected, and the time and expertise necessary to use traditional parallel performance tools to analyze that data. The goal of this project is to increase the level of performance tool support for message-passing application programmers on clusters of workstations. We added support for LAM/MPI into the existing parallel performance tool Paradyn. LAM/MPI is a commonly used, freely available implementation of the Message Passing Interface (MPI), and also includes several newer MPI features, such as dynamic process creation. In addition, we added support for non-shared filesystems into Paradyn and enhanced the existing support for the MPICH implementation of MPI. We verified that Paradyn correctly measures the performance of the majority of LAM/MPI programs on Linux clusters and show the results of those tests. In addition, we discuss MPI-2 features that are of interest to parallel performance tool developers and design support for these features.
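The dynamic process creation mentioned above is the MPI-2 MPI_Comm_spawn interface, by which a running MPI job starts additional processes at runtime; a tool such as Paradyn has to detect and attach to these processes as they appear. The following is a minimal parent-side sketch of that interface, not code from the thesis; the child executable name "worker" and the process count are illustrative placeholders.

    /* Minimal MPI-2 dynamic process creation sketch (parent side).
     * "worker" is a hypothetical child executable; error handling is omitted.
     * Build with an MPI compiler wrapper, e.g.: mpicc spawn_parent.c -o parent */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm children;   /* intercommunicator to the spawned group */
        int errcodes[4];

        MPI_Init(&argc, &argv);

        /* Ask the MPI runtime (e.g. LAM/MPI) to start 4 copies of "worker".
         * The children get their own MPI_COMM_WORLD and communicate with the
         * parent through the returned intercommunicator. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0 /* root */, MPI_COMM_SELF, &children, errcodes);

        /* A performance tool only learns about these processes now, at runtime,
         * which is why dynamic process creation matters for tool developers. */
        MPI_Comm_disconnect(&children);
        MPI_Finalize();
        return 0;
    }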


Related articles

A Model for Performance Analysis of MPI Applications on Terascale Systems

Profiling-based performance visualization and analysis of program execution is widely used for tuning and improving the performance of parallel applications. There are several profiler-based tools for effective application performance analysis and visualization. However, a majority of these tools are not equally effective for performance tuning of applications consisting of 100s to 10,000s of...
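One common mechanism behind such profiler-based tools is the MPI profiling interface (PMPI): every MPI routine is also available under a PMPI_ name, so a tool can supply its own wrapper that records timing and then forwards the call to the implementation. The sketch below is a generic illustration of that mechanism, not the implementation of any tool named here; the per-process counters are hypothetical.

    /* Generic PMPI interception sketch: the application's calls resolve to
     * these wrappers, which time the operation and forward it through the
     * implementation's PMPI_ entry points. */
    #include <mpi.h>
    #include <stdio.h>

    static double barrier_time  = 0.0;   /* hypothetical accumulators */
    static long   barrier_calls = 0;

    int MPI_Barrier(MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Barrier(comm);
        barrier_time += MPI_Wtime() - t0;
        barrier_calls++;
        return rc;
    }

    /* Dump the per-process totals when the application shuts MPI down. */
    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "rank %d: %ld barriers, %.6f s total\n",
                rank, barrier_calls, barrier_time);
        return PMPI_Finalize();
    }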


MPI Process Swapping: Performance Enhancement for Tightly-coupled Iterative Parallel Applications in Shared Computing Environments

MPI Process Swapping: Performance Enhancement for Tightly-coupled Iterative Parallel Applications in Shared Computing Environments by Otto K. Sievert Master of Science in Computer Science University of California, San Diego, 2003 Professor Henri Casanova, Chair Professor Francine Berman, Co-chair Simultaneous performance and ease-of-use is difficult to obtain for many parallel applications. Des...


Automatic Parallel Performance Analysis and Tuning for Large Clusters

This paper describes ongoing development of a performance analysis and tuning tool for parallel MPI [11] applications running on large clusters. Several parallel performance debugging tools such as VaMPIr [5], AIMS [3], and ParaVer [7] exist. Most of the existing tools provide post-mortem analysis and rely extensively on program visualization techniques to aid the user with performance bottlene...


Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4

We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at the National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4× when running on dual- and quad-core Opteron dual-socket SMPs. We extend these stud...
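The hybrid MPI-pthreads model referred to above uses MPI ranks for communication between nodes and POSIX threads for parallelism within a node. The following minimal sketch shows how such a code typically initializes the two layers, assuming MPI_THREAD_FUNNELED (only the main thread calls MPI); the thread count and worker function are placeholders, not the LBMHD code.

    /* Minimal hybrid MPI + pthreads initialization sketch. */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4          /* illustrative per-node thread count */

    static void *worker(void *arg)
    {
        /* each thread would compute its share of the node-local work here */
        (void)arg;
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* MPI_THREAD_FUNNELED: threads exist, but only the main thread calls MPI. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        pthread_t tid[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(tid[i], NULL);

        /* Only the main thread communicates between ranks, e.g. halo exchange. */
        printf("rank %d finished its threaded section\n", rank);
        MPI_Finalize();
        return 0;
    }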


Flexible collective communication tuning architecture applied to Open MPI

Collective communications are invaluable to modern high performance applications, although most users of these communication patterns do not always want to know their innermost workings. The implementation of the collectives is often left to the middleware developer, such as those providing an MPI library. As many of these libraries are designed to be both generic and portable, the MPI develope...
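From the application's point of view, a collective is a single call such as MPI_Allreduce whose internal algorithm (binomial tree, ring, pipeline, and so on) is selected by the library; a tuning architecture changes that selection without touching application code, as the sketch below illustrates. The MCA parameter named in the comment is an assumption about Open MPI's tuned collective component and may differ between versions.

    /* Application-level view of a collective: the call site is fixed while the
     * library (e.g. Open MPI's tuned component) picks the underlying algorithm. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double local = 1.0, global = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* The reduction algorithm is the middleware's choice; a tuning framework
         * changes that choice without modifying this code. With Open MPI this is
         * typically steered at launch time via MCA parameters, e.g.
         *   mpirun --mca coll_tuned_use_dynamic_rules 1 ...   (assumed parameter) */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks = %f\n", size, global);

        MPI_Finalize();
        return 0;
    }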



Journal:

Volume:   Issue:

Pages:

Publication date: 2004